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Longitudinal studies are often conducted to explore the cohort 
and age effects in many scientific areas. The within cluster correlation 
structure plays a very important role in longitudinal data analysis. 
This is because not only can an estimator be improved by incorporat- 
ing the within cluster correlation structure into the estimation proce- 
dure, but also the within cluster correlation structure can sometimes 
provide valuable insights in practical problems. For example, it can 
reveal the correlation strengths among the impacts of various factors. 
Motivated by data typified by a set from Bangladesh pertinent to the 
use of contraceptives, we propose a random effect varying-coefficient 
model, and an estimation procedure for the within cluster correla- 
tion structure of the proposed model. The estimation procedure is 
optimization-free and the proposed estimators enjoy asymptotic nor- 
mality under mild conditions. Simulations suggest that the proposed 
estimation is practicable for finite samples and resistent against mild 
forms of model misspecification. Finally, we analyze the data men- 
tioned above with the new random effect varying-coefficient model 
together with the proposed estimation procedure, which reveals some 
interesting sociological dynamics. 



1. Introduction. 

1.1. Theoretical background. Longitudinal data analysis has attracted 
considerable attention from statisticians recently. The methodology for para- 
metric-based longitudinal data analysis is quite mature; see, for example, 
Diggle, Heagerty, Liang and Zeger [5] and the references therein. The situa- 
tion with nonparametric-based longitudinal data analysis is quite different. 
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One of the main difficulties is how to incorporate the within cluster correla- 
tion structure into the estimation procedure. For nonparametric longitudinal 
regression, see Zeger and Diggle [30], Hoover, Rice, Wu and Yang [15], Fan 
and Zhang [10], Lin and Carroll [19], Wu and Zhang [28], Fan and Li [8], 
Qu and Li [24] and others. He, Fung and Zhu [13] investigate the robust 
estimation in generalized partial linear models for longitudinal data. Lin 
and Carroll [19] recommend that we ignore the within cluster correlation 
when kernel smoothing is employed. Welsh, Lin and Carroll [26] investigate 
the possibility of using weighted least squares based on the within cluster 
correlation structure when spline smoothing is used. They suggest that the 
weighted least squares estimator works better than the estimator based on 
working independence when spline smoothing is used. Wang [25] provides 
an innovative kernel smoothing and demonstrates that, when the true cor- 
relation is available, her estimator is more efficient than the most efficient 
estimator that is obtained by adopting a working independence approach. 

In longitudinal data analysis, whether parametric or nonparametric, within 
cluster correlation structure can be used to improve the efficiency of the 
estimation. However, the within cluster correlation structure is typically un- 
known in reality. In this paper, we investigate systematically the estimation 
of the within cluster correlation structure. 

1.2. Practical meaning. The within cluster correlation structure can lead 
to not only important improvement of the estimation but also some practical 
insights. As we shall see in the analysis of the Bangladesh data, the estimated 
within cluster correlation structure actually sheds interesting light on the 
impacts among factors. 

The Bangladesh data set is from the Bangladesh Demographic and Health 
Survey 1996-1997. This survey follows a two-stage sample design in which 
clusters were selected at the first stage, and women were sampled from these 
clusters at the second stage. The clusters correspond to villages in rural areas 
and neighborhoods in urban areas, and may loosely be termed communities. 
What is of interest is how the factors which are commonly found to be 
associated with fertility in Bangladesh affect the first birth interval, and how 
strongly correlated are the effects of these factors. The selected factors are 
(1) age at first marriage; (2) woman's level of education; (3) type of region 
of residence; (4) woman's religion; (5) year of marriage; (6) administrative 
area. Among these factors, type of region of residence and administrative 
area pertain to cluster levels and are called cluster-level variables, and the 
rest are called individual-level variables. 

We use y to denote the length of the first birth interval, Z the vector of 
individual- level variables, ^ the vector of cluster-level variables and e the 
random effect. For j = l,...,nj, i = 1, . . . ,m, let yij and Zij be the jth 
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observation of y and Z in the ith cluster, the observation of ^ at the ith 
cluster, and the random effect of the ith cluster, which is unobservable. 

When examining the effects of year of marriage and other factors on the 
length of the first birth interval, it is necessary to take into account clustering 
of responses for women in the same community. This is because the first 
birth intervals for women in the same cluster may be correlated due to 
unobserved cluster-level factors such as cultural norms and access to family 
planning programes. The usual way to incorporate unobservable variables in 
a statistical model is via random effects. This leads to the multilevel model 

(1-1) Vij = Zfji^l + Gi) + ija2 + Sij- 

The coefficient ai + e can be regarded as the impact of Z on y, which 
is random across the clusters. The correlation matrix of e can reveal how 
strongly correlated are the impacts of the components of Z on y. In this 
paper, we propose an estimation procedure for the covariance matrix of e. 
Let 

Xij = {Zjj, ij)^ , a = {a^, aj)'^. 
Equation (1.1) can be written as 

(1.2) yij = Xlsi+Zlei + eij. 

Model (1.2) assumes that the impacts of the factors on the length of the first 
birth interval are time-invariant, which may not be plausible because as a 
society Bangladesh is changing with time. So, it is more realistic to assume 
that the impacts can vary with time. This leads to the following random 
effect varying-coefficient model: 

(1.3) yij=Xfja{Uij) + Zj^ei + Eij, j = 1, . . . , nj, i = 1, . . . , m, 

where Ejj , j = 1, . . . , rij , i = 1, . . . , m, are measurement errors, which we as- 
sume to be i.i.d. with E{eij) = and var(ejj) = cr^. Here e^, i = I, . . . ,m, 
are random effects across the clusters, which we assume to be i.i.d. with 
E{ei) = and cov(ej) = S. Further, is independent of Sij] Xij is a p- 
dimensional covariate and Zij is a g-dimensional covariate associated with 
random effects. We assume that Ui < N < oo. {Uij , Xij , Z^j) are i.i.d. and 
independent of and Eij. 

The within cluster correlation structure in (1.3) has been used extensively 
in the literature to model the cluster effect. See, for example. Laird and Ware 
[18], Jennrich and Schluchter [17], Longford [21], Zeger, Liang and Albert 
[31], Lindstrom and Bates [20], Hedeker and Gibbons [14] and others. The 
focus in the above cited was on the estimation relevant to the regressive 
coefficients in parametric models. Wu and Liang [27] proposed an interesting 
backfitting estimation procedure to estimate the functional coefficients in a 
time-varying-coefficient mixed effects model. 
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Varying-coefficient models are useful when exploring dynamic systems. 
There is a growing literature addressing this kind of model, which includes 
Hastie and Tibshirani [12], Xia and Li [29], Cai, Fan and Li [1], Zhang, Lee 
and Song [32], Fan, Yao and Cai [9] and the references therein. 

There is some literature discussing how to estimate the within cluster 
covariance matrix for the parametric setting. The commonly used method 
is restricted maximum likelihood estimation (REML); see Laird and Ware 
[18]. While REML is theoretically appealing, the optimization involved can 
be very difficult. Recently, Fan, Huang and Li [7] proposed an innovative 
semiparametric estimation procedure for the covariance structure in longitu- 
dinal studies. They investigated both the quasi-likelihood approach and the 
minimum generalized variance approach. In this paper, focusing on model 
(1.3), we take a different approach to studying the estimation of S and cr^ 
that includes both the methodology and relevant asymptotics. The estima- 
tors proposed in this paper have closed forms, so they are easy to implement 
without any need for optimization. Also, the proposed estimation method 
does not depend on the likelihood function, so it is robust against likelihood 
specification. We have established asymptotic normality of the proposed es- 
timators. We have also conducted extensive simulation studies, which show 
that when the dimension of ej is not larger than 3, REML encounters no seri- 
ous optimization problem, and in such cases REML and our estimators have 
comparable performance, with perhaps the latter being marginally better. 
On the other hand, when the dimension of e,, is larger than 3, we have not 
been able to find a convergent algorithm for REML. In this paper, we will 
also show that, when spline smoothing is used, the weighted least squares es- 
timators of the functional coefficients perform much better if we incorporate 
the within cluster correlation structure estimated by our proposed method 
instead of assuming working independence. 

Our paper is organized as follows. We begin in Section 2 with a description 
of an estimation procedure for S and a^. We present the asymptotic prop- 
erties of the proposed estimators in Section 3 and assess the performance 
of the model by simulation in Section 4. In Section 5, using the new model 
and the proposed estimation procedure, we explore how the impacts of the 
factors on the length of first birth interval in Bangladesh change with time 
and how strongly correlated are the impacts of the factors on the length of 
first birth interval. 

2. Estimation procedure. The procedure first estimates a(-) based on 
working independence, then uses the residual to estimate <t^, and finally 
estimates S. 

For any given u, we use a and a to denote a(n) and da.{u)/du, respectively. 
By Taylor's expansion, we have 

a(C/y) ~a + a([/ij -n) 
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when Uij is in a small neighborhood of u. This leads to the local least squares 
procedure 

m rii 

(2.1) Y.T.lyv-^Ii{^ + HU^J-n)}fKHiU^,-u), 
i=ij=i 

where Kh{-) = K{-/h)/h, K{-) is the kernel function and h is the bandwidth. 
We minimize (2.1) with respect to (a, b) to get the minimizer (a, b). The 
estimator of a is taken to be a. By simple calculations, we have 

(m \ ~^ m 

i=l ) i=\ 

where Ip is p x p identity matrix and Op is the p x p matrix with all entries 
being 0, 

Ai = (Xj, DjXj), Xj = (Xji, . . . , Xim)'^ , 
Di = diag([/ii - u, . . . , Uin^ - u), 

Wi = dia.g{Kh{Uii -u),.. .,Kh{Uin^ - u)), Yt = {ya,.. ■,yin,)^. 
Next, we estimate a^. Let kiUij) be a with u being replaced by Uij. Let 

!"« — (^jli • • • ) ^irii) ) '^ij — Vij -^ij'^i.^ij) ^ — (j'ily • • • i ^irii) ) 

'f'ij = Uij ~ ^Jj^iUij), Zj = (Zji, . . . , Zj^.)"^, Pj = Zj(Z^Zj) ^Z^. 

For each given i, based on the residual rj, we have the following synthetic 
linear model: 

(2.2) rj = Zjej + ej, Bi = {eh, . . . ,£in^) ■ 
The residual sum of squares of this linear model, 

rssj = rj{ln^ - Pi)ri, 

would be the raw material for estimating a^. The synthetic degrees of free- 
dom of rssj is Hi — q. Let RSSj be rssj with rj replaced by r^. RSSj is a 
natural estimator for rssj. Pooling all RSSj, i = 1, . . . ,m, together naturally 
leads to the estimator of o"^ as 

m m 
i=l 1=1 

Finally, we estimate S. From (2.2), we have the least squares estimator 
of Bi as 

e, = (zTZi)-iz;^r, = ei + {ZjZi)-^Zjei, 
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m m m m 

i=l i=l i=l 1=1 

m 

+ Y,eieJZ,{ZjZ,r\ 

i=l 

The last two terms are of order Op(m^/^), so they are negligible. This leads 
to 

m f m m ^ 

1=1 [i=l i=l J 

{m m ^ 

j:e,eJ-a'Y.i^Jzr'\. 
i=i 1=1 J 

So, we use 

m m 

(2.3) S = m-i^e,e^-m-^^2^(Z^Z,)-i 

i=l 1=1 

to estimate S. In (2.3), ej is ej with replaced by fj. 

3. Asymptotic properties. For any q x q symmetric matrix A, we use 
vech(74) to denote the vector consisting of all elements on and below the 
diagonal of the matrix A, vec(M) to denote the vector by simply stacking the 
column vectors of matrix M below one another, and let ci = limm^oo n/{n — 
qm) and C2 = \\mm^oon/m. Obviously there exists a unique q^ x q{q + l)/2 
matrix Rq such that vec(^) = i?qvech(^). 

To make the presentation more clear, we introduce the following notation. 
Set 

= j fK{t)dt, 2 = 0,1,2,3, 

m 

b=in-qm)-'^J2vIiIn,-Pi)Vi, 

i=l 

m 

B2 = m-^ Y.^ZjZi)-^Zjr^,r^JZ,{Z]Zi)~\ 

i=l 

Further, we write 

m 
i=l 



m 

B,=m-'Y.(^JZ,)-\ 



i=l 
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A2 = Jim^m"i^^[vec{(ZTz,)"^}vecT{(zTz,)~^}], 

m rii 

A3 = l^^rn-' E E E[vec{{ZjZiy^Zi,Z^^{ZjZi)~^}. 
i=i j=i 

X yec^{{ZjZ,)-'Z,,Z^yzJZ,r% 

m rii 

7 = Yi^Jn - qmy' ^ E ^[^(Zf Z,)-iZ,,f - cig/c2 + 1. 
""^"^ i=ij=i 

As the data are unbalanced, that is, different subjects have different numbers 
of observations, the above expectations take no simple forms. Moreover, let 



Ai 



where F^,.), S(^) {r = I, . . . ,q) denote the rth row of S, F, respectively, and 
ig) is the Kronecker product. 

Theorem 1. Under the technical conditions in the Appendix, we have 

,2x2 



n^^'^i, vec. 



,h(S - E) - -/i^ f {vech(S2) - vech(Si)6}l 

4 VAio/U2-/Ui/ J 

^ iV(0, (i?Jii,)-iiijAi?,(i?Jii,)-ic2), 

A = E{eieJ ® eiej"} - vec(S) vec'^(S) + cj^jS ® T + T ® S + Ai} 
+ 2cj4{A2 - A3 + ci/c2(l + 7)r} + var(e2j{A3 + ci/c27r}. 

It is clear from Theorem 1 that the estimator S would achieve root-n con- 
vergence rate if the bandwidth is properly selected, say the bandwidth h is 
taken to be 0{n~^^^). For the estimation of the regression function based on 
the within cluster correlation structure, Welsh, Lin and Carroll [26] suggest 
that the spline-based weighted least squares estimation with the right weight 
would have smaller variance than the working independence approach, but 
they do not take the bias into consideration. Bias and variance are equally 
important when assessing the goodness of an estimator. To appreciate both 
bias and variance, it is better to use the mean squared error as a criterion to 
assess the accuracy of an estimator. It is not clear whether Welsh, Lin and 
Carroll's estimator is more efficient than the working independence one in 
terms of the mean squared error. How to construct a good estimator of the 
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regression function is very important and interesting, but it lies beyond the 
scope of this paper. Also based on the within cluster correlation structure 
Wang [25] proposes an estimator of the regression function, which again has 
smaller variance than the working independence one. Both Wang's approach 
and Welsh, Lin and Carroll's rely on the within cluster correlation structure 
being known although in reality it is often unknown. Our estimator T, can 
be used to substitute the (unknown) within cluster correlation structure in 
their estimation procedure. This would not change the efficiency of the es- 
timator of the regression function because S enjoys convergence rate 
Further, the established asymptotic normality is also useful for statistical 
inference. 

Theorem 2. Under the technical conditions in the Appendix, we have 

-^iV(0, 2(7^(1 + 7)ci+var(e?i)7ci). 

Similarly to Theorem 1, if we choose the bandwidth h to be 0{n~^^^), 
the estimator o"^ will have the convergence rate 

We give the proofs of these two theorems in the Appendix. 

4. Simulation study. In this section, we conduct a simulation study on 
the efficacy of the proposed estimation method, and compare our results with 
restricted maximum likelihood estimation (REML), which is commonly used 
in the literature for the estimation of the within cluster correlation structure. 
We will also demonstrate that the proposed estimator of the covariance ma- 
trix can be used to improve the estimators of the functional coefficients, and 
the proposed estimator of the covariance matrix is robust against likelihood 
specification and mild model misspecification. 



Table 1 
The MSEs of the estimators 
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Our method 




0.1008 


0.1013 


0.0997 


0.0987 


0.0972 


0.0967 




0.0785 


0.0788 


0.0791 


0.0789 


0.0787 


0.0760 


cr22 


0.0857 


0.0874 


0.0889 


0.0890 


0.0899 


0.0887 




0.0787 


0.0461 


0.0242 


0.0154 


0.0095 


0.0081 



The top row is the number of knots. The last column is the MSE of the estimators obtained 
by our method. The rest are the MSE of the estimators obtained by REML at different 
numbers of knots. 
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Example. In (1.3), withp = 2, Xij are i.i.d. from A^(0,72), Uij are i.i.d. 
from U{0, 1) and Eij are i.i.d. from A^(0, cr^). With q = 2, Zij are i.i.d. from 
A^(0, 12) and are i.i.d. from A^(0, S). Next, ni are i.i.d. and set to be the 
integer part of \9\ + 6, ~ A'"(0,4). We set m equal to 100 and cr^ to 1. 
S = ((7jj)2x2i and (Til is set to be 2, ai2 to 1.5 and (T22 to 2. We also set 
ai([/) =sin(27rC/) and a2{U) =cos{2ttU). 

The kernel function involved in the local linear modeling is taken to be 
the Epanechnikov kernel K{t) = 0.75(1 — The bandwidth is chosen to 
be 0.15. We repeat the simulation 100 times; the mean squared error (MSE) 
is used to assess the accuracy of the estimators. The MSEs of the estimators 
of (Tjj and o"^ are presented in the last column in Table 1, which suggests 
that the proposed estimators perform well. 

Next, we compare the proposed estimation with REML. For the non- 
parametric setting, we use the B-spline decomposition to approximate the 
functional coefficient. The knots in the B-spline decomposition are equally 
spaced. We choose the range of number of knots where the REML performs 
best when we use REML to estimate the cjjj and c^. The Downhill Simplex 
approach (Jacoby, Kowalik and Pizzo [16]) is employed for the optimization 
involved in REML. The MSEs of the obtained estimators are presented in 
Table 1. 

From Table 1, we can see that the newly proposed method and REML 
have comparable performance. Given the fact that REML is a likelihood- 
based method fully utilizing the information provided by data, it should be, 
in theory, the most efficient as long as the likelihood function is correctly 
specified. However, in practice, REML may not be practicable because the 
optimization involved in REML can be problematic when the dimension of ej 
is larger than 3. Specifically, the Downhill Simplex method, which served us 
well previously, has a tendency of failing to converge in such cases. Although 
we cannot rule out the possibility of better optimization algorithms, the 
newly proposed estimation does have the considerable practical advantage 
of yielding a closed-form solution and being optimization-free. 

As one referee has rightly pointed out, another advantage of the proposed 
estimation over REML is that the former does not rely on the likelihood 
function, so it is robust against likelihood specification. To examine this 
point, we set the Si j 3jS 1.1. d. from a uniform distribution with mean and 
variance o"^. The proposed estimation is employed again to estimate the aij 
and a^. The MSEs of the obtained estimators are 0.006 for o"^, 0.090 for an, 
0.066 for (7i2 and 0.091 for a22- This suggests that the proposed estimation 
is indeed robust with respect to likelihood specification. 

As mentioned in the Introduction, the covariance matrix of random ef- 
fects serves two purposes. First, it improves the estimator of the functional 
coefficient. Lin and Carroll [19] have shown that the estimator based on 
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Table 2 

The Improvement of the estimators 
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12 


13 


14 


15 


1.47 


2.20 


2.62 


2.70 


2.71 


2.81 


2.74 


2.80 


2.80 


0.10 


0.19 


0.30 


0.45 


0.67 


0.94 


1.28 


1.59 


2.00 



The top row is the number of knots, the second row is the IMP for ai(-) and the third 
row is the IMP for ci2(-)- 



working independence would be the best when kernel smoothing is used in a 
nonparametric setting. Welsh, Lin and Carroll [26] have also shown the esti- 
mator can be improved by the weighted least squares approach when spline 
smoothing is used. The following is to explore how much improvement can 
be achieved when we incorporate the proposed estimator of the covariance 
matrix of random effects in the latter approach. 

For any function g{-), if g{-) is an estimator of g{-), the mean integrated 
squared error (MISE) of g(-) is defined as 

MISE = j {g{u) - g{u)Y du. 

Let MISEi be the MISE of the estimator of the functional coefficient based 
on working independence, and MISE2 the MISE based on the weighted least 
squares approach, incorporating the proposed estimator of the covariance 
matrix of random effects. We use IMP = (MISEi -MISE2)/MISE2 to denote 
the improvement due to the weighted least squares approach. 

We have computed IMP when the number of knots in the B-spline de- 
composition is greater than 7 and less than 15, the choice being made on 
empirical grounds. In fact, we found that the MISE of the estimators based 
on either the weighted least squares approach or working independence is 
much smaller when the number of knots lies in this range than when it lies 
outside this range. The obtained IMPs are presented in Table 2, which sug- 
gests improvement across all cases and by a substantial margin for the larger 
number of knots. 

Finally, another interesting question is whether the proposed estimation 
still works when the model is misspecified. We have conducted some inves- 
tigation, leaving a systematic study to the future. We simulated data from 
the model 

Vij = Xijiai{Uij) +Xij2a2{Uij) + gi{ziji)eii + g2{zij2)ei2 + £ij, 

j — 1, . . . j'T'i, i — 1, . . . ,771. Xij — {Xij\,Xij2^ , Zij — {Zijl, Zij2') , — (^jl, 

612)^) and Uij and £ij are simulated in the same way as before, nj and m 
are also set in the same way as before. We set gi{z) = z + 0.1sin(2;) and 
5,2 (z) = z + 0.1sin(2;). We stiU set ai(f/) = sin(27rC/) and 02 (f/) =cos(27rC/). 
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Notice that tiji = gi{ziji) cannot be treated as a covariate because gi{-) 
is treated as unknown. The same remark apphes to 52(21^2). So, the model 
(1.3) is not the true model and we have a misspecified case here. 

The proposed estimation is employed again to estimate the aij and cr^. 
The MSEs of the obtained estimators are 0.006 for a^, 0.104 for an, 0.076 
for (T12 and 0.098 for o"22- This suggests that the proposed estimation is 
robust against a mild degree of misspecification. 

5. Real data analysis. The data come from the Bangladesh Demographic 
and Health Survey (BDHS) of 1996-1997 (Mitra et al. [23]), which is a 
cross-sectional, nationally representative survey of ever-married women aged 
between 10 and 49. The analysis is based on a sample of 8189 women nested 
within 296 primary sampling units or clusters, with sample sizes ranging 
from 16 to 58. We allow for the hierarchical structure of the data by fitting 
a two-level model with women at level 1 nested within clusters at level 
2. In the multilevel model, cluster-level random effects allow for correlation 
between outcomes for women in the same cluster. A further hierarchical level 
is the administrative division; Bangladesh is divided into six administrative 
divisions which are Barisal, Chittagong, Dhaka, Kulna, Rajshahi and Sylhet. 
Effects at this level are represented in the model by fixed effects since there 
are only six divisions. 

The dependent variable, yij, is the duration in months between marriage 
and the first birth for the jth woman in the ith. cluster. A small number of 
women (0.6% of the total sample size) reported a premarital birth, and these 
are excluded from the analysis. When a woman was asked for the date of her 
first marriage in the BDHS, the intention was to collect the age at which she 
started to live with her husband. However, it is likely that some older women 
reported the age at which they were formally married, which in Bangladesh 
can take place at a very young age and some time before puberty (Mitra 
et al. [23]). For this reason, we calculate the first birth interval assuming a 
minimum effective age at marriage of 12 years. The youngest age at first 
birth in the sample was 12 years and this is assumed to be the youngest age 
at which a woman can give birth. 11.53% of women in the sample had not 
had a birth by the time of the survey and are therefore right-censored. 

We consider several covariates which are commonly found to be associated 
with fertility in Bangladesh. The selected individual-level categorical covari- 
ates (Zij) include the woman's level of education (none coded by 0, primary 
or secondary plus coded by 1), religion (Muslim coded by 1, Hindu or other 
coded by 0) and age at first marriage in years. Another individual-level co- 
variate is year of marriage {Uij). We also consider two cluster-level variables, 
administrative division and type of region of residence (urban coded by 1, 
rural coded by 0). We take Barisal as the reference and the differences among 
the six administrative divisions are modeled by a set of dummy variables. 
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We take urban as the reference and the differences between urban and ru- 
ral clusters are modeled by dummy variables, is the vector of these six 
dummy variables. 

Typically, there are two ways to analyze right-censored data. One is the 
likelihood function approach based on Cox proportional hazard function 
(Cox [4]); another is the regression approach based on an unbiased transfor- 
mation (Fan and Gijbels [6]). In this paper, we adopt the latter approach. We 
recover the censored i/ij by the unbiased transformation proposed by Fan 
and Gijbels [6] first, then let Xij = (Z'ljj^J)'^ and employ (1.3) to fit the 
transformed data. The proposed estimation procedure is used to estimate 
the impacts of the covariates as well as the correlations of these impacts. 
The results obtained are presented in Figure 1 and Table 3. 

Table 3 shows that the correlation between the impact of age (of the 
woman at first marriage) and the impact of education is negative. This im- 
plies that the impact of age on the first birth interval would be weak in areas 
where the education level is high. The impact of age and the impact of reli- 
gion are strongly negatively correlated. This suggests that the impact of age 
on the first birth interval is also very weak in areas which are predominantly 
Muslim. The correlation between the impact of education and the impact of 
religion is also negative. This suggests that education would not have a big 
impact on the first birth interval in areas which are predominantly Muslim. 

From Figure 1, we can see the trend of length of the first birth interval 
is decreasing with time. This is attributed to a successful national family 
planning program (see, e.g., Cleland et al. [2]), which increases the age at 
first marriage. A nationally representative survey of women in 1996-1997 
(Mitra et al. [23]) found that the median age at marriage was 13.3 years 
among respondents aged 45-49 at the time of the survey, compared to 15.3 
years for respondents aged 20-24. 

The impact of the woman's age is negative and decreasing with time. The 
impact of the woman's education is negative until around 1984. Before 1984, 
the longer birth intervals among women with no education may be partly 
explained by the higher frequency with which these women report their age 
at formal marriage rather than their age at cohabitation. Calculating the 
duration to the first birth from an origin of age 12 for these women may 
have artificially infiated the lengths of their birth intervals. 

Urban impact is negative before 1959, and is getting smaller with time 
after 1959. This is because at earlier times, some women in rural areas 
got married very early. There was a considerable time period between their 
formal marriage and their age at cohabitation. Such cases are getting fewer 
with time. The impact of following the Muslim religion is always negative, 
and was decreasing sharply from 1955 to 1968, after which it stayed steady. 
This suggests that Muslims tend to have significantly shorter first birth 
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Fig. 1. The impacts of the covariates which are commonly found to be associated with 
fertility in Bangladesh on the length of the first birth interval. 
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Table 3 

The correlation between the impacts of the covariates 





AGEIMAR 


EDUC 


REL 


AGEIMAR 


1 


-0.180 


-0.934 


EDUC 


-0.180 


1 


-0.171 


REL 


-0.934 


-0.171 


1 



AGEIMAR is the impact of the age of the woman at first marriage in years, EDUC is the 
impact of woman's education and REL is the impact of the woman's religion. 



intervals than others before 1968; after 1968, they stih tend to have shorter 
first birth intervals than others but not as significantly. 

Looking at the impact of the division, it is noticeable that the intervals 
are shorter in Chittagong than in the other divisions. This regional effect 
is as expected and is most likely explained by lower contraceptive use in 
Chittagong (the most religiously conservative part of Bangladesh) compared 
to other divisions. Moreover, the impact of the division clearly varies with 
time. 

APPENDIX 

Let V = {{Uij,Xij, Zij) : j = I, . . . ,ni,i = 1, . . . , m}, and we use (C/, X, Z) 
to represent its population. Further, we write Qi{u) = E{XX'^\U = u). 

The following technical conditions are imposed to establish the asymptotic 
results: 

(1) Eefi < oo, i?||ei||^ < oo, Exf^ < oo and Ezj^ < oo, where ||ei|| = {ejei)^/"^, 
Xi denotes the ith. element of X and zj denotes the jth element of Z for 
s>2, i = l,...,p, j = l,...,q. 

(2) (■) is continuous in a neighborhood of u for j = 1, . . . ,p, where flj (•) 
is the jth element of a"(-). Further, assume a'-{u) ^ 0. 

(3) The marginal density /(•) of U has a continuous derivative in some 
neighborhood of n, and f{u) ^ 0. 

(4) rij{u), I3ii{u) and ^ij{u) are continuous in a neighborhood of n, where 

rij{u) = E{xiXj\U = u), j3ii{u) = E{xiZi\U = u), 

Jij{u) = E{xiZ'^T,Zxj\U = u) for i,j = 1, . . . ,p,l = 1, . . . ,q. 

(5) The function K{t) is a density function with a compact support. 

(6) h—>-0, nh? oo and nh^ is bounded. 

(7) There exists a sequence of positive real numbers M„ such that M„ oo 
and 

n-iM max 5](ZT(zTz,)-^^.s)' ^ 0. 
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For easy reference, we first present some useful lemmas. 
Lemma A.l. Let 

n n 
i=lj=l 

be a quadratic form in independent random variables Xi [E{Xi) = 0, E{Xf) = 
1], with Ai, . . . , A„, the eigenvalues of the symmetric matrix (aij), with an = 
for all i. We denote cr^ = E(T^^-^). Suppose that there is a sequence of real 
numbers K{n) such that 

(a) Er(n)^o-~^maxi<j<„(X;i<j<„aij) ^ 0, n^oo, and 

(b) maxi<j<„(£'[X?/||jjc.|>^(„)j]) ^ 0, n — > oo, and that the eigenvalues 
of the matrix (aij) are negligible: 

(c) fT~^ maxi<i<„(Af ) 0, oo; then a~^T(^n) f^o.s an asymptotic N{0, 1) 
distribution. 

See Commenges and Jacqmin-Gadda ([3], Theorem 1). 

Lemma A. 2. Let {Xi,Yi), . . . ,(Xn,Yn) be i.i.d. random vectors, where 
the Yi^s are scalar random variables. Assume further that E\Y\'^ < oo and 
supj. / \y\^f{x, y) dy < oo, where f denotes the joint density of (X, Y) . Let K 
be a bounded positive function with bounded support, satisfying a Lipschitz 
condition. Then 

= Op[{nh/\og{l/h)}-'/^] 

provided that ri^'^~^h oo for some e < 1 — s~^ . 

This follows immediately from the result obtained by Mack and Silverman 
[22]. 

As we need the result of Theorem 2 to prove Theorem 1, we prove Theorem 
2 first. 

Proof of Theorem 2. On using similar arguments as in Fan and 
Zhang [11], the asymptotic conditional bias and covariance of SL{Uij) are 
equal to 

(A.l) hms{k{U,,)\V} = -h^ ^^^^~i a^^([/,,)(l + op(l)) 

^ 1^-01^-2 — fJ-l 

and 



sup 



n 



' J2{KhiX, - x)Yi - E[Kh{Xi - x)Yi]} 



(A.2) 



coY{k{Uij)\V] = Op{{nh)-^). 
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Let Qi = Irii — Pi, and let Qi be a diagonal matrix generated from the diag- 
onal elements of Qi. Now 



lib -j^ //t 

s^liQi - Qi)si + ^iQi 

n — qm ~^ n — qm ~^ 



1 m 

n — qm ^ 

^ 1=1 

2 

+ - - E[{^i - ri)\V]fQ,E[{^i - ri)\V] 

n — qm ~^ 

^ m 



(A.3) 



xQi{{ri-r,)-E[{ri-ri)\V]} 



n — qm . , 
^ 1=1 



n ni 



n - qm .^^ 
2 

+ E ^iQ^ii^i - rO - E[{^i - r,)\V]} 

n — qm ~^ 

= Jnl + Jn2 + Jni + JnA + -AiS + Jn6 + </ri7- 

As is an idempotent matrix and all the diagonal components of Qi — Qi 
are equal to zero, by straightforward calculation it follows that 

2 rn 

E{Jnl \V) = E - = 0' 

n — qm 

^2 m 



E{Jn2\D) = E = 

n — qm 



i=l 



C0V(J„1, J„2|P) = E{JnMT^) = 7 To Etl'((Qi - Qi)Qi) = 0' 

[n — qmy ^ 



var(Jni|P) = <^ ^ k 

n — qm y n — qm ) 

By the law of large numbers, it can be easily verified that 

nvar(J„2|I') var(ef]^)7Ci, nvar(J„i|D) 2cr^(l + 7)01. 
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Since the eigenvalues of an idempotent matrix are either 1 or 0, by Eefi < oo, 
condition (7) and Lemma A.l, we obtain that 

ni/V„i-^iV(0,2a^(l + 7)ci). 

As Qi is a diagonal matrix, J„2 is a sum of independent variables. By 
Eefi < oo and condition (7), it follows from the Lindeb erg-Feller theorem 
that 

n^^\Jn2-(7^)^N{0,Yaiiel^hci). 
Since the two terms are uncorrelated, we have that 

(A.4) n^/\Jni + Jn2 - N{0, 2a\l + 7)01 + var(e2^)7Ci). 

It follows from (A.l) that 

-j^ m rii-q rii rii 
-^"3 = — — E E Y.JlQikjQiksXjjhms{k{Uij)\V} 
<i"''' i=l k=l j=ls=l 

(A.5) xX,^bias{a(^7,,)|P} 

,2\ 2 



7^ 2 ^l + op(l), 



where 



QirlQisl ~ '^'■^ ~ I Q r 7^ S 

In the following, we will show that the remaining parts 7^4 to J„7 in (A. 3) 
satisfy n^^^Jni = op(l), / = 4, . . . , 7. 

By (A.l), (A. 2) and the law of large numbers, we have 

E{\JnAm 



(A.6) 



(n - qm)\nofJ.2 - 

{m rii—q m rii 
i=l k=l j=l s=l 

xE[\{r,,-r,,)-E[{r,^-r,^)\V]\\V] 



X (l + op(l)) 

^ 7 2\y^ + ^p{^)) 

[n - qm)\iiQiX2 - /^i 
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rii 



1/2 



1=1 



= Op((n-i/i3)V2). 

By the inequality ab < 2(0^ + ft^), jy^^^ Qf^^ = 1, for r = 1, . . . , (n^ 
(A. 2), it follows that 



and 



(A.7) 



i^{| Jn5| 1^} < E E(^i - cov(a(C/,,)|P)X,, 

n — am ■' 



= Op((n/i)-i). 

Using (A.l) and the inequality rjjQirji < rjjrj^ due to Qi being an idempo- 
tent matrix, we have 



(n — gm)2 \ 

Therefore 
(A.8) 



2 N 2 m rii 



E E ^.Ta"(f/.,)a"(t/.,)^X.,(l + op(l)). 



As 

</n7 ■ 



m ri;— (J Hi rii 



~Z E E ^^QikjQiksK^ij - T^ij) - E[{rij -rij)\V]}£is, 

^,^1 j = i 



it follows from straightforward but tedious calculations and Lemma A. 2 that 
{{rij - Tij) - E[{rij - rij)\'D]}eis 



X < -V2 



E E ~ Uij){Zrier +eri)eis 

r=l 1=1 



'^^XrliUrl - Uij)Kh{Url - Uij){Zrier + £rl)ei 
lr=l 1=1 



X (l + Op(l)). 

By boundedness of the kernel function, independence of random errors and 
random effects, we have that E{j'^-j\V) = Op{{nh)"'^), that is, 

(A.9) J„7 = Op((n/i)-i). 
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Combining (A.3)-(A.9) and condition (6), we obtain that 

I 4 \H0fJ.2- fJ-lJ J 



n 

(A.IO) 



-^iV(0,2a^(l + 7)ci+var(e2j7Ci). □ 



Proof of Theorem 1. Using standard arguments as in the proof of 
Theorem 2 and (A.l), the conditional bias of S is 

bias{vec(S)|P} = f ^^^^ ~ V{vec(^2) - vec(Si)6}(l + op(l)) 
4 V^o/^2 — A^i/ 

+ Op((n/i)-i), 

and by straightforward but tedious calculation, the Lindeberg-Feller theo- 
rem and condition (7), it follows that 

n^/2{vec(S - S) - bias[vec(S)|P]} ^ (0, Aca), 

where 

A = ^{eie^ ® eie^} - vec(S) vec'^(S) + cj^jS (g) T + T ® S + Ai} 
+ 2a^{A2 - A3 + ci/c2(l + 7)r} + var(e2^){A3 + ci/c2-fT}. 
Therefore, we have 

vech(S - E) - \h'^( ^^^^~^^ V{vechm2)-vech(^i)6}| 

4 \H0H2-fJ'iJ } 

-^N{0,{R^Rg)-^R^ARq{R^R,r^C2). □ 
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